Date: Sun, 3 Mar 2024 12:55:43 +1100
> 1. And I have come up with the idea of stack swapping
This idea sounds similar to, but slightly different from, how a kernel
call happens: when a syscall occurs, the CPU switches to the kernel
stack associated with that thread, elevates to a higher ring level, and
can access pages marked supervisor on top of the mapped user-mode pages.
If you were to model it a little closer to system calls, where you have
a set of numbered routines/functions which can be called, then the
program could define the specific functions which are allowed to be
called with the raised permissions, with everything else going via the
kernel.
The entering and leaving wouldn't be too difficult to engineer; the
complexity, I think, would come from managing unique page tables for
threads. Typically each process has its own page table, so you would
need to duplicate it per thread. You wouldn't need to duplicate the
entire thing, but at least the top level and then everything down to
the page size you want for the protected region. The kernel doesn't
need to do this, as page table entries already have user/supervisor bit
flags. There are other complexities that would come with this, like
page table updates needing to be applied to multiple threads, etc. If
you simplified the design so that not every thread has its own unique
page table and accessible memory, but instead there is a privileged
mode with access to a single page table, then the design would probably
be much simpler. You could even take this to another level with more
operating system support, having something like Linux's seccomp apply
different permissions based on whether the thread is user-mode elevated
or not.
It wouldn't make you invulnerable to things like buffer overflows, but
it would mean that you can separate your secret-handling code and
memory from the rest of your user-mode code. Then again, you could
already do this with some IPC method by putting them in different
processes (or even machines).
The more I think about this approach, the further you can go down the
path of sandboxed executables, similar to what some web browsers have.
If this were something easier to architect with the support of some
library (standard or not), then people might consider it in a design;
at the moment I suspect 99% of programmers wouldn't know where to
begin.
> 2. Messing with code generation isn’t a bad idea, specially if we are
dealing with open-source applications. I can envision an additional
argument being passed to the compiler in order to randomize the layout of
where certain functions or variables are relative to each other, or change
other aspects of the resulting code to add additional fuzzing.
This sounds a little similar to the work Charlie Curtsinger and Emery
D. Berger at the University of Massachusetts Amherst did with
Stabilizer ( https://github.com/ccurtsinger/stabilizer/ ), which
randomized everything, though probably to a greater extreme. Their
intention was more to do with performance analysis than security.
While ASLR gives overall layout randomization, so an attacker first
needs to find the addresses, randomizing the text section might add a
slight barrier, but probably not as much as you would think. From the
perspective of a hacker, if you have a read primitive then you can
still read the text segment and extract enough to find the ROP gadgets.
pwntools has DynELF
<https://docs.pwntools.com/en/stable/dynelf.html>, which lets you
easily parse and work with the ELF files in a compromised process using
just a read primitive. If someone so desired, they could even attempt
to dump the entire binary using the read primitive, generate FLIRT
signatures from a binary they compiled themselves, and determine where
functions are placed within the one they just acquired. (I have done
similar things in CTF security challenges.)
Some other ideas to consider:
- Separate stack data from the return address stack. Doing this
without an ABI break would be hard: you would probably still need some
data on the actual stack for when there are lots of arguments, you
couldn't easily designate one register as the data stack pointer (so it
would require thread-local storage to maintain one), and you would need
to support crossing boundaries between code that does and doesn't
support it.
- Have compiler warnings for any array stored on the stack. Many
compilers currently generate canaries for cases where there is an
array; this could be taken a step further by generating compiler
warnings, since arrays are typically a source of buffer overflows.
- Have the .text segment be execute-only, with no read permission. It
might hinder some cases where things like switch lookup tables are
stored within the .text segment, but it would make it harder to leak
the text segment, e.g. via DynELF as mentioned above.
- Lightweight production-level asserts, so that things like
std::array's operator[] can terminate if the index is out of bounds.
Unfortunately all my emails to the sg14 list are hitting moderator
approval, so only the people I reply-all to are seeing anything; that's
what I get for being a lurker the entire time.
Thanks,
James Mitchell
On Sun, 3 Mar 2024 at 08:54, Tiago Freire via SG14 <sg14_at_[hidden]>
wrote:
> I have to admit that at the beginning I was a bit skeptical as to
> whether what was being asked was either achievable or usable.
>
> However, I did some further thinking on the issue and how to combine
> certain concepts to make something useful.
>
>
>
> 1.
>
> And I have come up with the idea of stack swapping, i.e. mid execution a
> thread can swap its stack for another, given that this “another” could be a
> protected page locked to the current thread (i.e. only that thread can read
> it).
>
> Assuming that restricting pages to specific threads is possible.
>
>
>
> The idea being that access can never be changed. When the application
> needs to do something sensitive, it will swap its stack to this special
> protected one, do all of its cryptography there, return, swap the stack
> back, zero out the special stack before returning it to the system.
>
> You could even use existing cryptographic libraries to keep it safe as
> long as they do everything on the stack, they wouldn’t be able to tell that
> they were running on a special stack. If they happen to require heap
> allocation then that will of course be leakable, but you can fix that by
> providing a special allocator where the pages are locked to the running
> thread.
>
>
>
> As long as this part of the code is done correctly (which can be made to
> have a small testable surface), this kind of system would be invulnerable
> to overflow attacks. And even if you managed to get remote execution to
> work on some other part of the code you may not get that far, as there
> would be no facility available to unlock the page assigned to another
> thread, unless you can:
>
> a) swap the running context of the thread that currently has privileged
> access to the memory at the right time. This is a higher bar to achieve.
>
> b) get a root kit to gain the OS privileged access. At that point the
> entire system is screwed, and this type of protection wouldn’t make much of
> a difference.
>
>
>
> As a bonus point, if you happen to leak this privileged memory, the system
> would be able to reclaim it back when the thread exits, or with a facility
> specially crafted to clear it.
>
> This wouldn’t need to affect code generation of current applications given
> that you would need to explicitly opt-in by using special functions.
>
>
>
> 2.
>
> Messing with code generation isn’t a bad idea, especially if we are dealing
> with open-source applications. I can envision an additional argument being
> passed to the compiler in order to randomize the layout of where certain
> functions or variables are relative to each other, or change other aspects
> of the resulting code to add additional fuzzing.
>
> The side effect of which would be, even if someone managed to replicate
> the exact build environment, and figure out the exact version of the
> application that is running, and they have an exploit where they could
> target a specific place in memory, they wouldn’t have access to the exact
> build and would be much harder to figure out the code layout to make the
> exploit work.
>
> Sure, this will have some impact on the predictability of the run time
> performance because of how the instruction appear in cache, but if what you
> are trying to do is protect data, predictable performance is not as high a
> priority.
>
>
>
> 3.
>
> There’s always the loose point of how the cryptographic keys/credit card
> secrets end up in the application to begin with. As James hinted at, it
> seems like a bad idea that the user facing application that is subject to
> attacks and exploits from external malicious actors is also the application
> that has direct access to your passwords. If you have a separate
> application whose only responsibility is to manage the secrets, and it can
> do it right, then the issue isn’t as much of a problem, this is not to say
> that this sort of memory protection isn’t useful, and protecting your
> one-time usable tokens isn’t worth doing, but perhaps may be less important
> if better security practices were adopted instead. There’s no magic
> solution that can save anyone if the developer just does “stupid shit”, and
> a minimum level of competence is required.
>
> And I’m not sure if adopting better security standards is more productive.
>
>
>
> In summary.
>
> In any case it seems to me there is indeed a great deal of something that
> can actually be done, and definitely worth researching. But most of it
> involves either hardware or operating system design, this could benefit all
> programming languages that can be compiled into byte code, not just C++.
> The role of C++ would only be to standardize the API’s to make it available
> to the user. *But these facilities will need to be created first outside
> of the C++ standard before the committee could do anything about it.*
>
>
>
> It is an interesting point of discussion; somebody should do research on
> this topic; maybe it will become standard practice in the future. But the
> C++ committee may not be the right venue.
>
>
>
> Br,
>
>
>
>
>
>
>
> *From:* Tiago Freire <tmiguelf_at_[hidden]>
> *Sent:* Saturday, March 2, 2024 9:04 AM
> *To:* Robin Rowe <robin.rowe_at_[hidden]>; sg14_at_[hidden]
> *Cc:* undefined-behavior-study-group_at_[hidden]
> *Subject:* Re: [SG14] Memory Safety and Page Protected Memory
>
>
>
> I agree it doesn't have to be foolproof in order to work.
>
> And an answer could be all of the above.
>
>
>
> Disconnected heap spaces
>
> memory locks
>
> memory scrubbers
>
> safer designs to interact with sensitive data
>
>
>
> they all do something; even if not perfect, if they can at least make
> attacks statistically impractical for 50% of applications, we have
> still made things safer.
>
>
>
> As long as it is understood that safer doesn't mean perfectly safe, I
> think we do have some points of action that can be researched and that
> can become reality.
>
>
>
> _______________________________________________
> SG14 mailing list
> SG14_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg14
>
Received on 2024-03-03 01:55:56